Smart Paper Notes

SHLE: Devices Tracking and Depth Filtering for Stereo-based Height Limit Estimation

Zhaoxin Fan , Kaixing Yang , Min Zhang , Zhenbo Song , Hongyan Liu , Jun He

Category: Computer Vision

2022-12-22

Over-height vehicle strikes have recently become frequent, causing great economic cost and serious safety problems. Hence, large and medium-sized vehicles, such as touring cars, need an alert system that can accurately discover any height limiting devices in advance. Detecting height limiting devices and estimating their height is the key to a successful height limit alert system. Although some works have studied height limit estimation, existing methods are either too computationally expensive or not accurate enough. In this paper, we propose a novel stereo-based pipeline named SHLE for height limit estimation. Our SHLE pipeline consists of two stages. In stage 1, a novel device detection and tracking scheme is introduced, which accurately locates the height limit devices in the left or right image. Then, in stage 2, the depth is temporally measured, extracted and filtered to calculate the height limit of the device. To benchmark the height limit estimation task, we build a large-scale dataset named "Disparity Height", where stereo images, pre-computed disparities and ground-truth height limit annotations are provided. We conducted extensive experiments on "Disparity Height" and the results show that SHLE achieves an average error below 10 cm even when the car is 70 m away from the devices. Our method also outperforms all compared baselines and achieves state-of-the-art performance. Code is available at https://github.com/Yang-Kaixing/SHLE.
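The geometry behind stage 2 can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a rectified pinhole stereo rig, a camera whose optical axis is parallel to the ground, and a simple median as the temporal filter. All function and parameter names (`focal_px`, `baseline_m`, `cy_px`, `camera_height_m`) are hypothetical.

```python
from statistics import median


def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth of a point in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def height_above_ground(row_px, depth_m, focal_px, cy_px, camera_height_m):
    """Height of an image point above the ground plane.

    Image rows grow downward, so a point above the optical axis has
    row_px < cy_px. Assumes the optical axis is parallel to the ground.
    """
    return camera_height_m + (cy_px - row_px) * depth_m / focal_px


def filtered_height(per_frame_heights):
    """Suppress per-frame outliers with a median (an assumed filter choice)."""
    return median(per_frame_heights)


if __name__ == "__main__":
    z = depth_from_disparity(disparity_px=10.0, focal_px=1000.0, baseline_m=0.5)
    h = height_above_ground(row_px=300.0, depth_m=z, focal_px=1000.0,
                            cy_px=360.0, camera_height_m=1.5)
    print(z, h, filtered_height([4.4, h, 9.0]))
```

The median is used here only because it tolerates a few grossly wrong disparity readings; the abstract does not say which filter the pipeline actually applies.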

Translated by Google Translate

FuRPE: Learning Full-body Reconstruction from Part Experts

Zhaoxin Fan , Yuqing Pan , Hao Xu , Zhenbo Song , Zhicheng Wang , Kejian Wu , Hongyan Liu , Jun He

Category: Computer Vision

2022-11-30

Full-body reconstruction is a fundamental but challenging task. Owing to the lack of annotated data, the performance of existing methods is largely limited. In this paper, we propose a novel method named Full-body Reconstruction from Part Experts (FuRPE) to tackle this issue. In FuRPE, the network is trained using pseudo labels and features generated from part experts. A simple yet effective pseudo ground-truth selection scheme is proposed to extract high-quality pseudo labels. In this way, a large number of existing human body reconstruction datasets can be leveraged to contribute to model training. In addition, an exponential moving average training strategy is introduced to train the network in a self-supervised manner, further boosting the performance of the model. Extensive experiments on several widely used datasets demonstrate the effectiveness of our method over the baseline. Our method achieves state-of-the-art performance. Code will be publicly available for further research.
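The exponential moving average idea mentioned above can be sketched in a few lines. This is a generic mean-teacher style update, given as an assumption about what such a strategy typically looks like rather than the authors' code; the function name `ema_update` and the default `decay` value are hypothetical.

```python
def ema_update(teacher, student, decay=0.99):
    """Return new teacher weights: decay * teacher + (1 - decay) * student.

    Both arguments are flat lists of floats standing in for model parameters.
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same length")
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


if __name__ == "__main__":
    teacher = [1.0, 0.0]
    student = [0.0, 1.0]
    for _ in range(3):
        # The teacher drifts slowly toward the student after each step.
        teacher = ema_update(teacher, student, decay=0.9)
    print(teacher)
```

A decay close to 1 keeps the teacher stable, which is the usual reason such a slowly updated copy is used to produce training targets.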


GIDP: Learning a Good Initialization and Inducing Descriptor Post-enhancing for Large-scale Place Recognition

Zhaoxin Fan , Zhenbo Song , Hongyan Liu , Jun He

Category: Computer Vision

2022-09-23

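Large-scale place recognition is a fundamental but challenging task that plays an increasingly important role in autonomous driving and robotics. Existing methods have achieved acceptably good performance; however, most of them concentrate on designing elaborate global descriptor learning network structures. The importance of feature generalization and descriptor post-enhancing has long been neglected. In this work, we propose a novel method named GIDP to learn a good initialization and induce descriptor post-enhancing for large-scale place recognition. In particular, GIDP comprises an unsupervised momentum contrast point cloud pretraining module and a reranking-based descriptor post-enhancing module. The former aims to learn a good initialization for the point cloud encoding network before the place recognition model is trained, while the latter aims to post-enhance the predicted global descriptor through reranking at inference time. Extensive experiments on both indoor and outdoor datasets demonstrate that our method achieves state-of-the-art performance using simple and general point cloud encoding backbones.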


Human Pose Driven Object Effects Recommendation

Zhaoxin Fan , Fengxin Li , Hongyan Liu , Jun He , Xiaoyong Du

Category: Computer Vision | Artificial Intelligence

2022-09-17

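In this paper, we study the new topic of object effects recommendation on micro-video platforms, which is a challenging but important task for many practical applications such as advertisement insertion. To avoid the background bias introduced by learning video content directly from image frames, we propose to use the meaningful body language hidden in 3D human pose for recommendation. To this end, a novel human pose driven object effects recommendation network termed PoseRec is introduced. PoseRec leverages the advantages of 3D human pose detection and learns information from multi-frame 3D human pose for video-item registration, resulting in high-quality object effects recommendation. Moreover, to address the inherent ambiguity and sparsity issues in object effects recommendation, we further propose a novel item-aware implicit prototype learning module and a novel pose-aware transductive hard-negative mining module to better learn pose-item relationships. Furthermore, to benchmark methods for this new research topic, we build a new dataset for object effects recommendation named Pose-OBE. Extensive experiments on Pose-OBE demonstrate that our method achieves better performance than strong baselines.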


Learning Large-scale Universal User Representation with Sparse Mixture of Experts

Caigao Jiang , Siqiao Xue , James Zhang , Lingyue Liu , Zhibo Zhu , Hongyan Hao

Category: Machine Learning | Natural Language Processing

2022-07-11

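Learning user sequence behaviour embeddings is sophisticated and challenging because of the complicated feature interactions over time and the high dimensionality of user features. Recently emerging foundation models, such as BERT and its variants, have encouraged a large body of researchers to investigate this field. However, unlike natural language processing (NLP) tasks, the parameters of a user behaviour model come mostly from the user embedding layer, which makes most existing works fail to train a large-scale universal user embedding. Furthermore, user representations are learned from multiple downstream tasks, and past research does not address the seesaw phenomenon. In this paper, we propose SUPERMOE, a generic framework for obtaining high-quality user representations from multiple tasks. Specifically, user behaviour sequences are encoded by an MoE Transformer, so the model capacity can be increased to billions or even trillions of parameters. To deal with the seesaw phenomenon when learning across multiple tasks, we design a new loss function with task indicators. We perform extensive offline experiments on public datasets and on private real-world business scenarios. Our approach achieves the best performance over state-of-the-art models, and the results demonstrate the effectiveness of our framework.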
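The capacity argument rests on sparse routing: each input activates only a few experts, so parameters can grow without a matching growth in compute. The sketch below shows generic top-k gating for a scalar input, under the assumption that the gate produces one logit per expert. It is not the framework's actual layer, and `top_k_gate` and `sparse_moe` are hypothetical names.

```python
import math


def top_k_gate(logits, k):
    """Keep the k largest gate logits and softmax-normalize over them only."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def sparse_moe(x, experts, gate_logits, k=2):
    """Weighted sum of the outputs of the k selected experts only."""
    weights = top_k_gate(gate_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())


if __name__ == "__main__":
    experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: 100.0 * x]
    # The third expert has a low gate logit and is never evaluated for k=2.
    print(sparse_moe(3.0, experts, gate_logits=[0.0, 0.0, -5.0], k=2))
```

Only the selected experts are called, which is the property that lets the number of experts increase while per-input cost stays roughly constant.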


Object Level Depth Reconstruction for Category Level 6D Object Pose Estimation From Monocular RGB Image

Zhaoxin Fan , Zhenbo Song , Jian Xu , Zhicheng Wang , Kejian Wu , Hongyan Liu , Jun He

Category: Computer Vision

2022-04-04

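Recently, RGBD-based category-level 6D object pose estimation has achieved promising performance improvements; however, its requirement for depth information prohibits broader applications. To alleviate this problem, this paper proposes a novel approach named Object Level Depth reconstruction Network (OLD-Net), which takes only RGB images as input for category-level 6D object pose estimation. We propose to predict object-level depth directly from a monocular RGB image by deforming the category-level shape prior into object-level depth and the canonical NOCS representation. Two novel modules, Normalized Global Position Hints (NGPH) and Shape-aware Decoupled Depth Reconstruction (SDDR), are introduced to learn high-fidelity object-level depth and delicate shape representations. Finally, the 6D object pose is solved by aligning the predicted canonical representation with the back-projected object-level depth. Extensive experiments on the challenging CAMERA25 and REAL275 datasets indicate that our model, though simple, achieves state-of-the-art performance.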


ACR-Pose: Adversarial Canonical Representation Reconstruction Network for Category Level 6D Object Pose Estimation

Zhaoxin Fan , Zhengbo Song , Jian Xu , Zhicheng Wang , Kejian Wu , Hongyan Liu , Jun He

Category: Computer Vision | Artificial Intelligence

2021-11-20

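Recently, category-level 6D object pose estimation has improved significantly with the development of canonical 3D representation reconstruction. However, the reconstruction quality of existing methods is still far from excellent. In this paper, we propose a novel Adversarial Canonical Representation Reconstruction Network named ACR-Pose. ACR-Pose consists of a Reconstructor and a Discriminator. The Reconstructor is primarily composed of two novel sub-modules: the Pose-Irrelevant Module (PIM) and the Relational Reconstruction Module (RRM). PIM learns canonical-related features that make the Reconstructor insensitive to rotation and translation, while RRM explores essential relational information between different input modalities to generate high-quality features. Subsequently, the Discriminator is employed to guide the Reconstructor toward generating realistic canonical representations. The Reconstructor and the Discriminator are optimized through adversarial training. Experimental results on the prevalent NOCS-CAMERA and NOCS-REAL datasets demonstrate that our method achieves state-of-the-art performance.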


RPR-Net: A Point Cloud-based Rotation-aware Large Scale Place Recognition Network

Zhaoxin Fan , Zhenbo Song , Wenping Zhang , Hongyan Liu , Jun He , Xiaoyong Du

Category: Computer Vision

2021-08-29

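Point cloud-based large-scale place recognition is an important but challenging task for many applications such as Simultaneous Localization and Mapping (SLAM). Treating the task as a point cloud retrieval problem, previous methods have achieved encouraging results. However, how to deal with the catastrophic collapse caused by rotation remains under-explored. In this paper, to tackle this issue, we propose a novel Point Cloud-based Rotation-aware Large Scale Place Recognition Network (RPR-Net). In particular, we propose to learn rotation-invariant features in three steps. First, we design three kinds of novel Rotation-Invariant Features (RIFs), which are low-level features that hold the rotation-invariant property. Second, using these RIFs, we design an attentive module to learn rotation-invariant kernels. Third, we apply these kernels to previous point cloud features to generate new features, which is the well-known SO(3) mapping process. By doing so, high-level scene-specific rotation-invariant features can be learned. We call the above process Attentive Rotation-Invariant Convolution (ARIConv). To achieve the place recognition goal, we build RPR-Net, which takes ARIConv as the basic unit for constructing a dense network architecture. Powerful global descriptors for retrieval-based place recognition can then be extracted from RPR-Net. Experimental results on prevalent datasets show that our method achieves results comparable to existing state-of-the-art place recognition models and significantly outperforms other rotation-invariant baseline models when solving rotation problems.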
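What it means for a low-level feature to be rotation invariant can be checked numerically. The sketch below uses the distance from each point to the cloud centroid, a generic example chosen for illustration. It is not one of the three RIFs designed in the paper, and all function names are hypothetical.

```python
import math


def centroid(points):
    """Mean of a list of 3D points given as (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[axis] for p in points) / n for axis in range(3))


def distance_to_centroid(points):
    """One scalar per point; unchanged when the whole cloud is rotated."""
    c = centroid(points)
    return [math.dist(p, c) for p in points]


def rotate_z(points, angle_rad):
    """Rotate every point about the z axis by angle_rad."""
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    return [(cos_a * x - sin_a * y, sin_a * x + cos_a * y, z)
            for x, y, z in points]


if __name__ == "__main__":
    cloud = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
    before = distance_to_centroid(cloud)
    after = distance_to_centroid(rotate_z(cloud, 0.7))
    # The raw coordinates change, but the feature values agree up to rounding.
    print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before, after)))
```

Raw coordinates fed to a network change under rotation, while features of this kind do not, which is the motivation for building kernels on top of them.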


SVT-Net: Super Light-Weight Sparse Voxel Transformer for Large Scale Place Recognition

Zhaoxin Fan , Zhenbo Song , Hongyan Liu , Zhiwu Lu , Jun He , Xiaoyong Du

Category: Computer Vision

2021-05-01

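Point cloud-based large-scale place recognition is fundamental for many applications such as Simultaneous Localization and Mapping (SLAM). Although many models have been proposed and have achieved good performance by learning short-range local features, long-range contextual properties have often been neglected. Moreover, model size has become a bottleneck for wide adoption. To overcome these challenges, we propose a super light-weight network model termed SVT-Net for large-scale place recognition. Specifically, on top of the highly efficient 3D Sparse Convolution (SP-Conv), an Atom-based Sparse Voxel Transformer (ASVT) and a Cluster-based Sparse Voxel Transformer (CSVT) are proposed to learn both short-range local features and long-range contextual features. Consisting of ASVT and CSVT, SVT-Net achieves state-of-the-art results on benchmark datasets in terms of both accuracy and speed with a super-light model size (0.9M). Meanwhile, two simplified versions of SVT-Net are introduced, which also achieve state-of-the-art results while further reducing the model size to 0.8M and 0.4M, respectively.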


Towards Efficient and Domain-Agnostic Evasion Attack with High-dimensional Categorical Inputs

Hongyan Bao , Yufei Han , Yujun Zhou , Xin Gao , Xiangliang Zhang

Category: Machine Learning | Artificial Intelligence

2022-12-13

Our work targets the search for feasible adversarial perturbations to attack a classifier with high-dimensional categorical inputs in a domain-agnostic setting. This is intrinsically an NP-hard knapsack problem in which the exploration space grows explosively as the feature dimension increases. Without the help of domain knowledge, solving this problem via heuristic methods, such as Branch-and-Bound, suffers from exponential complexity and can still produce arbitrarily bad attack results. We address the challenge through the lens of multi-armed bandit based combinatorial search. Our proposed method, namely FEAT, treats modifying each categorical feature as pulling an arm in multi-armed bandit programming. Our objective is to achieve a highly efficient and effective attack using an Orthogonal Matching Pursuit (OMP)-enhanced Upper Confidence Bound (UCB) exploration strategy. Our theoretical analysis, which bounds the regret gap of FEAT, guarantees its practical attack performance. In the empirical analysis, we compare FEAT with other state-of-the-art domain-agnostic attack methods on various real-world categorical data sets from different applications. Substantial experimental observations confirm the expected efficiency and attack effectiveness of FEAT in different application scenarios. Our work further hints at the applicability of FEAT for assessing the adversarial vulnerability of classification systems with high-dimensional categorical inputs.
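The bandit view can be made concrete with the classic UCB1 selection rule, in which each arm would correspond to one categorical feature that may be modified. This is plain UCB1 only: the OMP enhancement described in the abstract is not reproduced, and the function name `ucb1_select` and the `exploration` constant are assumptions for illustration.

```python
import math


def ucb1_select(counts, reward_sums, exploration=2.0):
    """Pick the arm with the highest mean reward plus confidence bonus.

    counts[a] is how often arm a was pulled, reward_sums[a] its total reward.
    Arms that were never pulled are tried first.
    """
    for arm, pulls in enumerate(counts):
        if pulls == 0:
            return arm
    total_pulls = sum(counts)
    scores = [
        reward_sums[arm] / counts[arm]
        + math.sqrt(exploration * math.log(total_pulls) / counts[arm])
        for arm in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Arm 1 has a slightly lower mean but was pulled only once,
    # so its confidence bonus dominates and it is explored next.
    print(ucb1_select(counts=[100, 1], reward_sums=[60.0, 0.5]))
```

The bonus term shrinks as an arm is pulled more often, which is how the rule balances trying rarely modified features against exploiting those that already proved effective.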
